ABSTRACT


In Software Defined Radios a good portion (or even the entirety) of the 
modulation and demodulation processes is performed in the digital domain. The data rate 
of the transmitted information is very important, since efficiency is a key requirement in 
real time implementations and cost increases considerably with the number of samples 
per second to be processed. In this thesis, we address the problem of efficient design of 
the resampling operations, so that they can be implemented on Field Programmable Gate 
Arrays (FPGAs). 

A set of filtering and resampling operations is developed in the Simulink 
environment through Xilinx/Simulink blocksets, where all the included subsystems of the 
design are fully accessible by the designer in any stage of operation. The key ingredient is 
the use of a Multiplier and Accumulator (MAC) architecture, which can be either time 
multiplexed for maximum hardware efficiency, or run on a parallel structure for 
maximum time efficiency. 
EXECUTIVE SUMMARY


In Software Defined Radios (SDR) a good portion (or even the entirety) of the 
modulation and demodulation process is performed in the digital domain. The 
reconfigurability and the versatility of the SDR can be efficiently supported by the Field 
Programmable Gate Arrays (FPGAs) for hardware implementations. 

FPGAs are high performance integrated circuits suitable for many Digital Signal 
Processing (DSP) applications with the feature of being reprogrammable by the designer. 
In this way, the system can be easily reconfigured to a number of different applications. 

The proper software needed to program an FPGA is provided by System 
Generator (Sysgen), which is an FPGA design program responsible for driving the FPGA 
through the high-level design environment of Simulink. A combination of common and 
synthesized Simulink/Xilinx blocks from the Simulink library along with MATLAB
codes have been used in order to construct a configurable scheme capable of 
implementing the following three operations: 

a) Finite Impulse Response (FIR) filter 

b) Decimation by an integer factor 

c) Interpolation by an integer factor 

The key ingredient is the use of the Multiplier and Accumulator (MAC) 
architecture, which can be either time multiplexed for maximum hardware efficiency, or 
embedded on a parallel structure for maximum time efficiency. 

The main components of the implementation are the Dual Port Ram Xilinx block, 
which is a random access memory containing both data and the FIR filter coefficients, 
together with the DSP48 Xilinx block, which performs the multiplication and addition on 
a sequential basis. The DSP48 block is specifically designed for high-speed arithmetic 
operations and it is part of the standard Xilinx Virtex family architecture. The objective is 
to perform the proper arrangement of the input data and FIR filter coefficients so that the 
resulting multiplication and accumulation will perform the three examined operations 
according to the theoretical formulations. Since the operations are performed serially, the
data need to be upsampled in order to handle the increased clock rate provided by System 
Generator (Sysgen) and then properly downsampled. 

In this research we have shown that for all three cases (FIR filter, Decimation, 
Interpolation) the overall structure is the same. What defines each operation is the control 
logic (Controller) and the storing of the filter parameters. 

The controller consists of logic blocks from the Xilinx blockset and is
responsible for updating the Dual Port Ram’s memory vectors (at the Sysgen clock
rate) in order to provide the proper dual sequential output. The dual output of the memory
block is multiplied and accumulated by the DSP48 math slice. The outcome of the DSP48 is
a data stream in which the desired output samples of the three examined operations are
embedded at regular intervals, spaced by a multiple of the Sysgen clock period. Therefore, the
final output can be obtained by downsampling the output of the DSP48 by the proper factor.

MATLAB was used to verify the consistency of the simulation with the theory. 
I. INTRODUCTION


A. BACKGROUND

In Software Defined Radio, the modulation and demodulation processes are 
performed in the digital domain. The data rate of the transmitted signal is usually several 
orders of magnitude smaller than the data rate necessary to drive the Digital to Analog
Converters (DACs) at the radio frequency (RF). In real time implementations, since the 
cost increases according to the number of samples per second, we need to adapt the 
sampling rate to the frequency content of the transmitted signal. Therefore, signals at 
radio frequency (RF) are sampled at a rate comparable to the RF frequency, while the 
signals at baseband are sampled at the information rate [1]. The reconfigurability and the 
versatility of the SDR can be efficiently supported by the Field Programmable Gate 
Arrays (FPGAs) for hardware implementations. 

1. FPGA for Digital Signal Processing 

The Field Programmable Gate Array (FPGA) is a high performance integrated 
circuit suitable for Digital Signal Processing (DSP) applications. An FPGA has the 
feature of being programmable by the designer and it can be easily reprogrammed. 
Physically, an FPGA is a two-dimensional array of gates consisting of various logic DSP 
blocks and interconnections between them in order to perform DSP operations [2].

Figure 1 shows a Virtex-4 FPGA embedded in a processing board. Figure 2 shows 
a number of important features such as the array of ‘slices’ disposed in columns of 
macroblocks. The latter are blocks, constituted of memory and arithmetic units that are 
programmed to perform suitable operations. The entire interconnected mesh can be 
programmed into highly parallel algorithms [2]. 
Figure 1. Actual view of FPGA VIRTEX 4 (From: [4]).



Figure 2. Physical view of FPGA VIRTEX-4 (From: [2]). 
2. Design Environment

The Xilinx DSP blockset is a suitable tool for designing FPGA algorithms in the 
Mathworks Simulink design environment. This is supported by the System Generator 
(Sysgen), which is an FPGA design program responsible for driving the FPGA through the
high-level design environment of Simulink. A sufficient number of common and complex 
blocks, which are provided from several blocksets (including the Xilinx blockset) of the 
Simulink Library, are properly synthesized in order to design various DSP applications 
[5]. Figure 3 shows on the left the Simulink Library Browser with various basic elements
of the Xilinx blockset, and, on the right of the same figure, a simple application in
Simulink using Sysgen. Specifically, an input data sequence is loaded from MATLAB’s
workspace and upsampled by a factor of two. The output is shown on the ‘Scope’ by 
double clicking the corresponding icon. Both ‘in’ and ‘out’ blocks are the interfaces of 
common Simulink blocks with the Xilinx blockset. The entire system is controlled by the 
Sysgen block. The specified parameters of all blocks can be modified by the user when 
the respective icon is selected. 



Figure 3. Simulink environment using Xilinx. 
B. OBJECTIVE


In this thesis, we address the problem of efficient design of resampling operations 
so they can be implemented on Field Programmable Gate Arrays (FPGAs). The key 
ingredient is the use of a Multiplier and Accumulator (MAC) architecture, which will 
allow us to perform the following operations: 

1) Finite Impulse Response (FIR) filters 

2) Decimation by an integer factor 

3) Interpolation by an integer factor 

The outcome of these three schemes is the development of a set of filtering and 
resampling operations performed in Xilinx/Simulink. All the subsystems in the designs 
are fully accessible by the designer. 

In order to perform the three operations (FIR filtering, Decimation and 
Interpolation by an integer factor), a basic design scheme in the Simulink environment is 
used and is modified accordingly to fit the three cases. Since the objective is to develop
software suitable for programming FPGAs, a combination of Xilinx and Simulink blocks
as well as MATLAB code is used. Figure 4 illustrates the basic structure of the
simulation.



Figure 4. Basic Structure of Simulation. 
In each of the three designs, the proper arrangement of the input data points and
Finite Impulse Response (FIR) filter coefficients is achieved in the Dual Port Ram Xilinx 
block, which is a random access memory. The dual output of the memory block is 
multiplied and accumulated by the DSP48 Xilinx block, which is an efficient block for 
DSP operations implementing a Multiplier and Accumulator (MAC) operation. From the 
resulting output, we selectively extract the data points of interest according to the 
theoretical formulas of the three desired operations. Although several Xilinx/Simulink 
blocks are used and are explained in the next chapters, the principal blocks are the Dual 
Port Ram and the DSP48. 

1. Efficient Use of the Dual Port Ram and DSP48 Xilinx Blocks 

The Dual Port Ram Xilinx block is a dual memory device that allows the user to 
specify the width and the values of each memory part. This specific block uses two sets 
of ports dedicated to reading and writing of data. Each port has three inputs: (a) the 
address line ‘addr’, (b) the input data ‘din’ and (c) the write enable ‘we’. In addition, each 
port has one output. There is also an option of additional enable and synchronous reset 
inputs for both ports that were not necessary for the purpose of this design. The Dual Port 
Ram Xilinx block, along with its specified parameter window, is shown in Figure 5. 


[Figure 5 content: parameter window of the Dual Port RAM block (Xilinx Dual Port Random
Access Memory), with tabs Basic, Advanced and Implementation. Settings shown: Depth = 256;
Initial value vector = [x0, h]; Memory Type (Distributed memory or Block RAM); initial
values for the port A and port B output registers; optional synchronous reset and enable
ports for ports A and B; Latency = 1. Block ports: inputs addra, dina, wea, addrb, dinb,
web; outputs A, B.]


Figure 5. Dual Port Ram Xilinx Block. 
Both memories are accessible for reading and writing by providing the right
address at the address ports ‘addra’ and ‘addrb’. The initial value vector, as
indicated in the parameter window of Figure 5, is the concatenation of the two initial
vectors (initial data vector $x_0$ and initial FIR filter coefficients $h$). The ‘wea’ and ‘web’
ports are the write enable ports for each memory, feeding the Dual Port Ram with a Boolean
signal ‘0’ or ‘1’. When the ‘we’ port is set to 1, the memory writes the value of the
‘din’ port to the location specified by the corresponding address line. Each of the two
outputs depends on the write mode, which in our case is ‘read after write’: the output takes
exactly the value stored at the location indicated by the address line once the write cycle
is completed [5].

For the purposes of this thesis, the second part of the memory remains unchanged 
(no input data) and keeps the initial value. Specifically, input b takes the values of 
properly ordered (according to the case of interest) finite impulse response (FIR) filter 
coefficients, which are generated in the initialization of the simulation through any 
MATLAB function such as ‘firpm’. Therefore ports ‘dinb’ and ‘web’ are fed with a 
signed and a boolean zero respectively. On the other hand, the first part of the memory 
changes according to ‘address’ and ‘write enable’ ports. 

The outputs of ports A and B are two signed bit streams: one for the input data 
points and one for coefficients of the FIR filter, aligned in such a way so that their 
multiplication and accumulation will provide us the desired result for the three examined 
cases. 

The DSP48 Xilinx block (also referred to as an extreme DSP slice or DSP48 math
slice) is an efficient tool for many DSP applications; it can dynamically handle many
operations and can be cascaded with other DSP48 blocks. It consists of an 18-bit-by-18-bit
signed multiplier with a 48-bit adder and a programmable multiplexer that can be
driven as required to perform specific operations [3]. The logic circuit of the slice is
depicted in Figure 6, while the corresponding Xilinx block, along with some of the
operations it supports, is shown in Figure 7.
Figure 6. DSP48 Slice (From: [6]).



Figure 7. DSP48 Xilinx Block. 
In this thesis the DSP48 is used as a multiplier and accumulator (MAC) block and
its operation is defined as $P = P + A \cdot B$. With this block the product of the two inputs A and
B (derived from the Dual Port Ram) is added at each clock cycle to the previously accumulated
value P. A reset port is available on the slice so that the output can be reset at the clock
cycles required by each of the examined operations.
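
As an illustration only, the MAC behavior described above can be sketched in Python. This is a behavioral model, not the DSP48 itself: the function name `mac_stream` is hypothetical, and the sketch assumes that an active reset clears P to zero in that cycle instead of accumulating.

```python
def mac_stream(a_stream, b_stream, reset_stream):
    """Return the accumulator value P after every clock cycle.

    Behavioral sketch of P = P + A*B. Assumption made for this example:
    when the reset flag is active, P is cleared to zero in that cycle.
    """
    p = 0
    outputs = []
    for a, b, reset in zip(a_stream, b_stream, reset_stream):
        p = 0 if reset else p + a * b
        outputs.append(p)
    return outputs


# Four clock cycles with a reset in the third one.
print(mac_stream([1, 2, 3, 4], [1, 1, 1, 1], [0, 0, 1, 0]))  # [1, 3, 0, 4]
```

The list returned by the sketch plays the role of the P output stream, from which individual values are later selected.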

C. RELATED WORK 

Although a number of approaches to the design of FIR filtering and resampling operations
exist in the literature ([10], [11]), to the best knowledge of the author there has been
no systematic way of designing these filters in a general fashion.

The main contribution of this research is an architecture that is fully scalable to
any implementation in terms of filter coefficients and resampling factor.
II. FINITE IMPULSE RESPONSE FILTER WITH ONE MAC
(MULTIPLIER ACCUMULATOR)


A. THEORETICAL PERSPECTIVE 

In the digital domain, the output sequence $y[n]$ of a Finite Impulse Response
(FIR) filter is given by the following expression:

$$ y[n] = \sum_{k=0}^{N} h[k]\, x[n-k], \qquad (2.1) $$

where $h[n]$ is the impulse response of the filter, $x[n]$ is the input sequence, and N is
the degree of the transfer function of the FIR filter.

Both $x[n]$ and $y[n]$ are at the same clock rate $F_x = F_y = F_s$, as $x[n] = x(nT_s)$ and
$y[n] = y(nT_s)$, where $T_s = 1/F_s$ is the sampling interval [7]. The discrete convolution,
along with its graphical representation, is depicted in Figure 8.



Figure 8. Discrete Convolution. 
We can verify from Figure 8 that the convolution operation can be graphically
implemented as a sliding window over a data sequence. In particular, at any time n we
need to save N + 1 data points $x[n], x[n-1], \ldots, x[n-N]$ together with the
coefficients $h[0], h[1], \ldots, h[N]$.
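
The sliding-window view of equation (2.1) can be sketched in a few lines of Python. This is a reference model only; the function name `fir_direct` and the assumption of zero initial conditions (x[m] = 0 for m < 0) are choices made for this example.

```python
def fir_direct(h, x):
    """Direct evaluation of y[n] = sum_k h[k] * x[n-k].

    Assumes zero initial conditions: samples before x[0] are taken as 0.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, hk in enumerate(h):
            if n - k >= 0:          # skip samples before the start of x
                acc += hk * x[n - k]
        y.append(acc)
    return y


print(fir_direct([1, 2, 3], [1, 0, 0, 1]))  # [1, 2, 3, 1]
```

Each output sample needs one multiplication and one addition per coefficient, which is the work that a single MAC has to perform serially.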

In this chapter, we address the problem of implementing the filtering operation 
using one Multiplier and Accumulator (MAC). In this way, the convolution sum is 
computed in about N clock pulses (where N denotes the degree of the transfer function 
of the FIR filter), thus requiring a higher clock rate to be provided by the System 
Generator, which controls the operation and its parameters. The objective is to perform 
the proper arrangement of the input data points and the filter’s coefficients so that the 
multiplication and accumulation procedure as well as the selective extraction of outcomes 
will give us the desired convolution result in the most efficient way. 

B. SOFTWARE IMPLEMENTATION 

The Simulink/Xilinx implementation needed to perform the FIR filtering is shown
in Figure 9. The main components of the implementation are the Dual Port Ram, which
contains both the data and the FIR filter coefficients, and the DSP48, which performs the
multiplication and addition on a sequential basis. Since the operations are performed
serially, the data need to be upsampled in order to handle the increase of the clock rate
provided by System Generator. The controller consists of a set of counters (one for the
coefficients and one for the data points) along with logic blocks (implemented in the Xilinx
blockset), and controls the flow of the data at the output of the Dual Port Ram as well as
the timing of the operations.
Figure 9. Finite Impulse Response Filter with One MAC.

In order to test the performance of the filter, Gaussian white noise and a
sinusoidal signal are selectively available (through a manual switch) as inputs. The input
signal is sampled at rate $F_s$, while System Generator (Sysgen) works at a higher sampling rate
equal to $(N+1)F_s$. Since the system rate provided by Sysgen is higher, the input
data is upsampled by the integer factor N + 1 with the corresponding Xilinx block.

The objective is to achieve a proper alignment of the data and filter’s coefficients, 
so that they can be applied to a MAC resulting in the convolution operation. Towards this 
goal, we need two memory vectors x and h containing the data and the filter coefficients 
respectively provided by the Dual Port Ram and a MAC provided by the DSP48. 

1. Control Logic for Data and Filter Coefficients 

The vector h of the filter coefficients is defined as
$h = [h[0], h[1], h[2], \ldots, h[N-1], 0]$. It has length N + 1 and it remains unchanged during
the operation of the filter. Therefore, the ports ‘dinb’ (data input b) and ‘web’ (write
enable b) are set to false. The first N coefficients of the vector h are generated in
MATLAB as an FIR filter using the function ‘firpm’, while the additional $(N+1)$th
coefficient is intentionally set to zero to accommodate the way the DSP48 operates as a
MAC, as explained in the MAC procedure.

The input data vector stored in the first part of the memory of the Dual Port Ram
is a circular shift register of length N, updated at times $t = nT_s$ by
$x[(n)_N] \leftarrow x[n]$, with $(n)_N = 0, 1, \ldots, N-1$ denoting the modulo operation. In the
implementation, $(n)_N$ is a periodic counter with update rate $F_{ac} = (N+1)F_s$. The initial
value of the memory vector x is set to the initial conditions (zero, for example) and
its value is updated according to the corresponding ‘address’ and ‘write enable’ ports
provided by the controller. Figure 10 illustrates the controller of this design.
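
The circular update of the data memory can be illustrated with a short Python sketch. The helper name `write_circular` is hypothetical, and the memory is modeled as a plain list initialized to zero.

```python
def write_circular(memory, n, sample):
    """Store the n-th input sample at address n modulo the memory length."""
    memory[n % len(memory)] = sample
    return memory


memory = [0, 0, 0]                       # data memory of length N = 3
for n, sample in enumerate([10, 11, 12, 13, 14]):
    write_circular(memory, n, sample)
print(memory)  # [13, 14, 12]
```

After the fourth and fifth writes the oldest samples are overwritten, so the memory always holds the N most recent inputs.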
Figure 10. Controller.

The time representation of the ‘data address’ and ‘coefficients address’ sequences of
Figure 10 is shown in Figure 11. In particular, at time $(n-1)T_s$ the accumulator is
initialized by $a((n-1)T_s) = 0$ (where ‘a’ denotes the content of the accumulator). At
every subsequent clock cycle $T_{ac} = 1/F_{ac}$ the accumulation is updated as

$$ a\big((n-1)T_s + \lambda T_{ac}\big) = a\big((n-1)T_s + (\lambda-1)T_{ac}\big) + data\_addr(\lambda) \cdot coeff\_addr(\lambda), $$

where $\lambda = 1, \ldots, N$. The output $y[n]$ at time $nT_s$ is shown in the timing diagram of
Figure 11.

[Timing diagram: the output data y[n-1] and y[n] are separated by one input sampling
interval, $T_s = (N+1)T_{ac}$.]

Figure 11. Time Representation of Simulation. 

In what follows we demonstrate the functionality of the design according to the 
timing diagram illustrated in Figure 11. 

2. Alignment of Data and Filter Coefficients in the Dual Port Ram 

The length of the vector x is chosen to be one less than the length of h so that 
the writing procedure will introduce a shift by a factor of one in the content of memory 
x . It can be inferred that the outcome of the Dual Port Ram is a set of bitstreams, where 
the output at port A is a recurrent window of length N +1 (in every T s ) in which the 

input data is progressively shifted by one position from left to right, while the bitstream 
of port B is a repetition of the vector h . Figure 12 illustrates the outcome of the Dual 
Port Ram with time running from right to left. 
Figure 12. Outcome of Dual Port Ram.

3. Sequential Multiplication and Accumulation (MAC) of Data and 
Filter Coefficients using DSP48 

The output bitstream from the Dual Port Ram, as shown in Figure 12, is
processed by the DSP48 Xilinx block, which works as a MAC. Its operation mode is
defined as $P = P + A \cdot B$ (referring to Figure 7), where the product of each pair of values
from the Dual Port Ram output ports A and B is added to the previously accumulated
value. A reset signal (selected from the DSP48 options) for the outcome P is
applied once every N + 1 clock cycles and is derived from the properly delayed ‘write enable’
signal of the controller of the Dual Port Ram (referring to Figure 9). The delay
is adjusted so that the reset of the outcome P occurs at the one clock cycle in every N + 1
in which a data point is multiplied with the zero coefficient of vector h.
Consequently, considering the length N + 1 of the block pairs in Figure 12,
whenever a product $x[k] \cdot h[0]$ (with k arbitrarily chosen) is accumulated with the previous
N sums of products of each block pair, a data point of the convolution $y[n]$ is
produced, as shown in the timing diagram (Figure 11). For illustration, Figure 13 shows
the first two points $y[0]$ and $y[1]$, computed at times $(N+1)T_{ac}$ and $2(N+1)T_{ac}$,
respectively, by the first two sets of blocks.


[Figure 13 content: blocks of length N + 1 from port A (for example x[1], x[0], 0, ..., 0)
are multiplied element by element with the coefficient block h[0], h[1], h[2], ..., h[N-1], 0
from port B. After accumulation of all N + 1 products of pairs, the first block yields
x[0]·h[0] = y[0] and the second block yields x[0]·h[1] + x[1]·h[0] = y[1].]


Figure 13. Outcome of DSP48. 

Referring to Figure 13, the bitstream outcome P of the DSP48 can be considered as a
set of blocks of length N + 1 in which the desired convolution outputs are embedded
in every $(N+1)$th element of each block, as shown in the timing diagram in Figure 11.
Therefore, by downsampling the data P by the factor N + 1 (the same factor that was
used when the input data was upsampled) the desired convolution result is provided.
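
The complete single-MAC procedure can be summarized with a behavioral Python sketch. It is not the Simulink/Xilinx model: the function name `fir_single_mac`, the order in which the coefficients are read within each block, and the placement of the reset in the zero-coefficient slot at the start of the block are assumptions made for this illustration.

```python
def fir_single_mac(h, x):
    """Serial FIR filtering with one multiply-accumulate per clock cycle.

    h holds the N coefficients h[0..N-1]; a zero is appended so that each
    input sample is processed in a block of N + 1 clock cycles.
    """
    n_taps = len(h)
    coeffs = list(h) + [0]          # coefficient memory of length N + 1
    data = [0] * n_taps             # circular data memory of length N
    p = 0
    p_stream = []                   # accumulator output at the fast clock rate
    for n, sample in enumerate(x):
        data[n % n_taps] = sample   # x[(n)_N] <- x[n]
        for cycle in range(n_taps + 1):
            if cycle == 0:
                # Zero-coefficient slot: the product is 0, so P restarts here.
                p = coeffs[n_taps] * data[n % n_taps]
            else:
                k = cycle - 1
                p = p + coeffs[k] * data[(n - k) % n_taps]
            p_stream.append(p)
    # Downsample by N + 1: keep the last accumulator value of every block.
    return p_stream[n_taps::n_taps + 1]


print(fir_single_mac([1, 2, 3], [1, 0, 0, 1, 2]))  # [1, 2, 3, 1, 4]
```

The accumulator stream is N + 1 times longer than the input, and only one value per block carries a convolution output, which mirrors the upsample, accumulate and downsample chain described above.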
C. RESULTS

In order to test the performance, an FIR filter was designed and tested with two
classes of input signals. In particular, the FIR filter has been designed as an equiripple
filter with the following characteristics:

Passband: 0-0.2 (in terms of digital frequency f)

Stopband: 0.3-0.5 (in terms of digital frequency f)

Order: 60

The signals tested are a sinusoid and white noise. The sinusoid has frequency
$F = 0.1 \cdot F_s$ (Hz) with sampling frequency $F_s = 10000$ (Hz) and $f = F/F_s$, while the
white noise is sampled at the same rate.

The frequency spectrum of the original signal and the resulting filtered signal for 
the sinusoidal case is shown in Figure 14. We can verify that the frequency spectrum of 
the original signal remains the same as long as its frequency is within the passband of the 
FIR filter. 
Figure 14. Frequency Spectrum of the Original and Filtered Signal (Sinusoidal Case).

For the Gaussian white noise case the corresponding frequency spectrum, along 
with the frequency spectrum of the filtered signal, is depicted in Figure 15. We can
observe that the frequencies of the Gaussian white noise are spread all over the frequency 
spectrum while the frequency spectrum of the corresponding filtered signal maintains the 
frequencies that are within the passband of the FIR filter and eliminates all the others. 
Figure 15. Frequency Spectrum of the Original and Filtered Signal (Gaussian White
Noise Case).
III. DECIMATION BY AN INTEGER FACTOR


A. THEORETICAL PERSPECTIVE 

1. Sampling Continuous Time Signals

It is well known that, by the sampling theorem, the sampling frequency $F_s$ has to
be at least twice the signal bandwidth B [7]. The Discrete Time Fourier Transform of a
sampled signal $x[n]$ with actual frequency content F, which is sampled at rate $F_s$, is
given by the following expression:

$$ X(f) = \mathrm{DTFT}\{x[n]\} = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j 2\pi f n}, \qquad (3.1) $$

where f is a dimensionless quantity denoting the digital frequency $f = F/F_s$. From
equation (3.1) we can verify that $X(f)$ is periodic with period one, since

$$ X(f+1) = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j 2\pi (f+1) n} = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j 2\pi f n}\, e^{-j 2\pi n} = X(f). $$

Therefore, the information is contained in one period (within the interval
$-1/2 < f < 1/2$) of the periodic repetition of the frequency spectrum. Figure 16
illustrates the frequency spectrum of a continuous time and sampled signal respectively.
Figure 16. Sampling Continuous Time Signals.
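
The periodicity of the spectrum can be checked numerically with a short Python sketch. The helper `dtft` is hypothetical and assumes a finite-length sequence that starts at n = 0; the sample values and the test frequency are arbitrary.

```python
import cmath


def dtft(x, f):
    """Evaluate X(f) = sum_n x[n] * exp(-j*2*pi*f*n) for a finite sequence."""
    return sum(xn * cmath.exp(-2j * cmath.pi * f * n) for n, xn in enumerate(x))


x = [1.0, -0.5, 0.25, 2.0]
f = 0.13
# X(f) and X(f + 1) agree up to floating-point rounding.
print(abs(dtft(x, f) - dtft(x, f + 1.0)) < 1e-9)  # True
```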

2. Analysis of Downsampling (Decimation) 

In digital communications such as Software Defined Radio, the exchange of 
information needs to be done in the most efficient way, in order to reduce complexity and 
improve efficiency while preserving the content of the information. The Downsampling 
operation (Decimation) decreases the number of samples per second of a given signal by 
an integer factor of D . An example of decimation by integer factor of D = 3 is shown in 
Figure 17. 
[Figure 17 content: downsampling by an integer factor D = 3. Out of every D = 3 samples of
x[n], one sample is kept.]

Figure 17. Downsampling Operation. 
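
As a minimal sketch of the operation itself (without any anti-aliasing filter), downsampling keeps one sample out of every D. The function name `downsample` and the choice of keeping samples 0, D, 2D, ... are assumptions of this example.

```python
def downsample(x, d):
    """Keep samples 0, d, 2d, ... of x and discard the rest."""
    return x[::d]


print(downsample(list(range(10)), 3))  # [0, 3, 6, 9]
```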

Consequently, the decimation procedure introduces a loss of information due to
the elimination of some data points, so care is needed in order to preserve the
necessary information of the signal. The distortion caused by the downsampling
operation appears as additional frequency components in the frequency spectrum of
the resampled signal. This phenomenon is called aliasing, and it is avoided by properly
filtering the signal before downsampling [8].

When a signal sampled at rate $F_{s1}$ with frequency spectrum $X(f_1)$ (in terms of
digital frequency) is resampled at a lower sampling rate $F_{s2} = F_{s1}/D$ (where D is an
integer), the resulting frequency spectrum of the resampled signal is given by the
following expression [8]:

$$ Y(f_2) = \frac{1}{D} \sum_{k=0}^{D-1} X\!\left(\frac{f_2 - k}{D}\right). \qquad (3.2) $$

From equation (3.2) it is easy to show that no aliasing occurs if the signal has no
frequency content at digital frequencies $|f| > \frac{1}{2D}$, in which case equation (3.2)
becomes $Y(f_2) = \frac{1}{D} X\!\left(\frac{f_2}{D}\right)$ for $|f_2| < 1/2$ [9]. Figure 18
illustrates this concept.



Figure 18. Aliasing Effect in Frequency Spectrum. 
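
A small numerical example, given here only as an illustration, shows aliasing in the time domain. With D = 3, a tone at digital frequency 0.4 lies above 1/(2D) = 1/6. After downsampling, its frequency becomes 3 · 0.4 = 1.2, which is indistinguishable from 0.2 because the spectrum is periodic with period one. The variable names are choices made for this sketch.

```python
import math

D = 3
f_high = 0.4     # above 1/(2D) = 1/6, so it aliases after downsampling
f_alias = 0.2    # 0.4 * 3 = 1.2, which equals 0.2 modulo 1

# Tone at f_high, keeping one sample out of every D.
high_decimated = [math.cos(2 * math.pi * f_high * n) for n in range(30)][::D]
# Tone generated directly at f_alias at the lower rate.
low_tone = [math.cos(2 * math.pi * f_alias * m) for m in range(10)]

max_diff = max(abs(a - b) for a, b in zip(high_decimated, low_tone))
print(max_diff < 1e-9)  # True
```

The two sequences coincide, so after decimation the high-frequency tone cannot be told apart from a genuine tone at 0.2.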


Generally, in order to efficiently downsample a noisy signal by an integer factor
D, with information frequency content within the interval
$\left(-\frac{1}{2D}, \frac{1}{2D}\right)$ and without
introducing aliasing, it is necessary to first filter the signal with the appropriate Low Pass
Filter (LPF). In this way, the useful part of the frequency spectrum is protected from
aliased frequencies caused by noise. Figure 19 illustrates this, along with the
specifications of the appropriate Low Pass Filter (LPF).
Figure 19. Filtering and Downsampling a Discrete Signal.

3. Efficient Implementation of Decimation Operation using Noble 
Identities and Filter’s Polyphase Decomposition 

An efficient way of implementing filtering and downsampling operations is by
using the Noble identities and the filter’s polyphase decomposition. Since the filter in
Figure 19 operates at the higher sampling rate $F_{s1}$, it is desirable for the filter to be
placed after the downsampling operation, resulting in a significant decrease in the
number of operations since $F_{s2} < F_{s1}$. It is well known that, by the polyphase
decomposition of the filter and the Noble identities, the downsampling operation can be
implemented as in Figure 20. In particular, the signal is buffered into D components at
the lower sampling rate and each component is filtered by the corresponding polyphase
component of the Low Pass Filter [8].
Figure 20. Efficient Implementation of Decimation.
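
The equivalence between filtering followed by downsampling and the polyphase structure can be checked with the following Python sketch. The function names, the zero initial conditions, and the indexing convention h_p[k] = h[kD + p] with branch inputs x_p[m] = x[mD - p] are assumptions made for this illustration.

```python
def fir(h, x):
    """Causal FIR filter with zero initial conditions."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def decimate_direct(h, x, d):
    """Filter at the high rate, then keep one output out of every d."""
    return fir(h, x)[::d]


def decimate_polyphase(h, x, d):
    """Split h and x into d branches and filter each at the low rate."""
    n_out = (len(x) + d - 1) // d
    y = [0] * n_out
    for p in range(d):
        h_p = h[p::d]                               # h_p[k] = h[k*d + p]
        x_p = [x[m * d - p] if m * d - p >= 0 else 0  # x_p[m] = x[m*d - p]
               for m in range(n_out)]
        y = [a + b for a, b in zip(y, fir(h_p, x_p))]
    return y


h = [1, 2, 3, 4, 5]
x = list(range(1, 12))
print(decimate_direct(h, x, 3) == decimate_polyphase(h, x, 3))  # True
```

In the polyphase version every multiplication contributes to an output that is kept, whereas the direct version computes D outputs and discards all but one.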

B. DECIMATION BY TWO WITH FIR MAC AND POLYPHASE 
DECOMPOSITION 

In the case of decimation by an integer factor D = 2 we can relate the input and
output signals as

$$ y[n] = \sum_{k=0}^{2N-1} h[k]\, x[2n-k], \qquad (3.3) $$

where $x[n] = x(nT_s)$, and $T_s$ is the sampling interval. Consequently, the output is
sampled at half the input rate.

The FIR filter polyphase decomposition provides two components, one for the
even samples, $h_0[k] = h[2k]$, and one for the odd samples, $h_1[k] = h[2k+1]$. Therefore,
equation (3.3) can be rewritten as

$$ y[n] = \sum_{k=0}^{N-1} h_0[k]\, x[2(n-k)] + \sum_{k=0}^{N-1} h_1[k]\, x[2(n-k)-1], \qquad (3.4) $$

which breaks down the computation into two phases associated with the even and odd
samples, respectively. Equation (3.4) can be rewritten as

$$ y[n] = h_0[0]\, x[2n] + \sum_{k=1}^{N-1} h_0[k]\, x[2(n-k)] + h_1[0]\, x[2n-1] + \sum_{k=1}^{N-1} h_1[k]\, x[2(n-k)-1]. \qquad (3.5) $$

Equation (3.5) highlights the fact that, during the computational time interval
$(2n-2)T_s < t < (2n)T_s$, the data vector needs to be updated with samples $x[2n-1]$ and
$x[2n]$, while the data in the two summations are available before time $(2n-2)T_s$.
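
The split between newly arrived and previously stored samples can be verified with a short Python sketch that evaluates equations (3.3) and (3.5) side by side. The function names and the zero-padding helper `x_at` are hypothetical; the filter is assumed to have an even number of coefficients, 2N.

```python
def x_at(x, m):
    """Return x[m], or 0 outside the recorded samples."""
    return x[m] if 0 <= m < len(x) else 0


def y_direct(h, x, n):
    """Equation (3.3): y[n] = sum_k h[k] * x[2n - k]."""
    return sum(h[k] * x_at(x, 2 * n - k) for k in range(len(h)))


def y_polyphase(h, x, n):
    """Equation (3.5): new samples x[2n], x[2n-1] plus stored samples."""
    h0, h1 = h[0::2], h[1::2]       # h0[k] = h[2k], h1[k] = h[2k+1]
    new = h0[0] * x_at(x, 2 * n) + h1[0] * x_at(x, 2 * n - 1)
    old = sum(h0[k] * x_at(x, 2 * (n - k)) for k in range(1, len(h0)))
    old += sum(h1[k] * x_at(x, 2 * (n - k) - 1) for k in range(1, len(h1)))
    return new + old


h = [1, 2, 3, 4, 5, 6]
x = [3, 1, 4, 1, 5, 9, 2, 6]
print(all(y_direct(h, x, n) == y_polyphase(h, x, n) for n in range(4)))  # True
```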

1. Software Implementation 

The Simulink/Xilinx implementation needed to perform the decimation-by-two
has the same structure as the model presented in Figure 4, with modified parameters to
match this case. Specifically, the initial values of the vectors of the Dual Port Ram along
with the controller (the logic circuit responsible for arranging the data points and the FIR
filter’s coefficients) are changed in order to implement equation (3.5). Furthermore, the input
data is upsampled to the System Generator’s clock rate and the outcome is
downsampled by twice the upsampling factor, implementing the decimation-by-two operation.
Figure 21 illustrates the structure of this specific design.
Figure 21. Downsampling by Two.

In order to test the performance of the simulation, a sinusoidal signal is provided
as an input. The input signal is sampled at rate $F_s$, while System Generator (Sysgen)
works at a higher sampling rate equal to $N F_s$, with 2N - 2 being the degree of the
transfer function of the FIR filter, which is decomposed into its polyphase components.
The polyphase filter is generated during the initialization of the
simulation. Since the system rate provided by Sysgen is higher, the input data is
upsampled by the integer factor N with the corresponding Xilinx block.

The objective is to achieve a proper alignment of the data and filter’s coefficients 
so that they can be applied to a MAC resulting in the decimation-by-two operation. 
Towards this goal, we need two memory vectors x and h , containing the data and the 
filter coefficients provided by the Dual Port Ram, and a MAC provided by the DSP48. 
a. Control Logic for Data and Filter Coefficients

The vector h of the filter coefficients is defined as 

$$ \mathbf{h} = \big[\, h[0], h[2], \ldots, h[2N-2],\; h[1], h[3], \ldots, h[2N-3],\; 0 \,\big] $$

and it is the concatenation of the two polyphase components (one for the even and one 
for the odd samples) of a length-$(2N-1)$ FIR filter (which is generated in MATLAB) with 
an additional zero at the end. The vector h has total length $2N$ and remains unchanged 
during the downsampling-by-two operation. Therefore the ports ‘dinb’ (data input b) and 
‘web’ (write enable b) are set to false. The final zero coefficient of h is added to 
accommodate the way the DSP48 is used as a MAC, as explained in the MAC procedure. 
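
The layout of the coefficient memory described above can be sketched in a few lines of Python. This is only an illustration of the ordering (even phase, odd phase, trailing zero); the function name `decimation_coeff_vector` and the tap values are assumptions made for the example and are not taken from the Simulink model.

```python
def decimation_coeff_vector(h):
    """Concatenate the even and odd polyphase components of h and append one zero.

    h is assumed to have odd length 2N - 1, as in the text.
    """
    if len(h) % 2 == 0:
        raise ValueError("expected an odd-length filter (2N - 1 taps)")
    even = list(h[0::2])   # h[0], h[2], ..., h[2N-2]  (N taps)
    odd = list(h[1::2])    # h[1], h[3], ..., h[2N-3]  (N - 1 taps)
    return even + odd + [0]


# Illustrative filter with N = 3, i.e. 2N - 1 = 5 taps (values are made up).
h = [1, 2, 3, 4, 5]
print(decimation_coeff_vector(h))  # [1, 3, 5, 2, 4, 0]
```

The result has length 2N, which matches the size of the coefficient memory stated in the text.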

The input data vector stored in the first part of the memory of the Dual 
Port Ram is a vector x of length $2N$, updated at times $t = nT_s$ as 

$$ x\big[(n)_{2N-1}\big] \leftarrow x[2n] $$

for the even samples and 

$$ x\big[(n-N)_{2N-1}\big] \leftarrow x[2n-1] $$

for the odd samples, with $(n)_{2N-1} = 0, 1, \ldots, 2N-2$ denoting the modulo operation. 
In the implementation, $(n)_{2N-1}$ is a periodic counter with update rate 
$F_{ac} = NF_s$. The memory vector x is initialized to the initial conditions (zero, for 
example) and is updated according to the corresponding ‘address’ and ‘write enable’ 
ports driven by the controller. Figure 22 illustrates the structure of the controller. 
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Figure 22. Controller. 

The time representation of the ‘data address’ and ‘coefficients address’ 
sequences of Figure 22 is shown in Figure 23. In particular, at time $(2n-2)T_s$ the 
accumulator is initialized by $a((2n-2)T_s) = 0$ (where ‘a’ denotes the accumulation 
function). At every subsequent clock cycle $T_{ac} = 1/F_{ac}$ the accumulation is 
updated by 

$$ a\big((2n-2)T_s + \lambda T_{ac}\big) = a\big((2n-2)T_s + (\lambda-1)T_{ac}\big) + \big(\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big), $$

where $\lambda = 1, \ldots, 2N$. At time $(2n-1)T_s = (2n-2)T_s + NT_{ac}$ the input data 
is updated. The output $y[n]$ at time $2nT_s$ is shown in the timing diagram of Figure 23. 
Figure 23. Time Representation of Simulation. 

In order to demonstrate the functionality of the implementation, Figure 23 
illustrates the timing of the various signals involved. 

The output of the Dual Port Ram is a pair of bitstreams, one from port A 
(data points) and one from port B (filter coefficients). The output of port A is a 
recurrent window of length $2N$, which is subdivided into two windows of length $N$ 
(one for the even samples and one for the odd samples of the input signal). At every 
interval $T_{ac}$ both the even and the odd samples are updated, introducing a shift by 
one position from left to right. The bitstream of port B is a repetition of the vector 
h. Figure 24 illustrates the output of the Dual Port Ram. 
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Figure 24. Outcome of Dual Port Ram. 

b. Sequential Multiplication and Accumulation (MAC) of Data and 
Filter Coefficients using the DSP48 

The output bitstream from the Dual Port Ram, shown in Figure 24, is 
processed by the DSP48 Xilinx block, which works as a MAC. Its operation mode is defined 
by $P = P + A \cdot B$ (referring to Figure 7): the product of each pair of outputs A 
and B of the Dual Port Ram is accumulated with the previous products. A reset signal for 
the output P (selected from the DSP48 options) is applied once every $2N$ clock cycles 
and is derived from the properly delayed ‘write enable 1’ signal of the controller of 
the Dual Port Ram (referring to Figure 22). The delay is adjusted so that the reset of P 
occurs at the instant where a data sample is multiplied by the zero coefficient of 
vector h. 

Consequently, considering the length ($2N$) of the block pairs in Figure 24, 
after the last product $x[k] \cdot h[0]$ is accumulated with the previous sums of 
products of each block pair, a data point $y[n]$ of the decimation-by-two operation is 
generated, as also shown in the timing diagram (Figure 23). Figure 25 shows the first 
two points $y[0]$, $y[1]$, computed at times $(2N)T_{ac}$ and $(4N)T_{ac}$. 

[Figure 25: output blocks of length $2N$, ending in $x[2]\,h[0] + x[0]\,h[2] + x[1]\,h[1] = y[1]$.] 

Figure 25. Outcome of DSP48. 

Referring to Figure 7, the bitstream output P of the DSP48 can be 
considered as a set of blocks of length $2N$ in which the desired output samples of the 
decimation-by-two operation appear at every $(2N)$th element of each block, as shown in 
the timing diagram in Figure 23. Therefore, by downsampling the data P by the factor 
$2N$ (twice the factor that was used when the input data was upsampled) the 
desired decimation-by-two operation is performed. 
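
The sequence of operations described in this subsection (one product per clock cycle, a reset once per block of 2N products, and retention of the last value of each block) can be mimicked with a short behavioral sketch. It is not the Simulink/Xilinx model: the function names, the zero initial conditions, and the test values are assumptions chosen for illustration, and the address generation of the controller is replaced by plain list indexing.

```python
def mac_decimate_by_two(x, h):
    """Behavioral model of one MAC producing y[n] = sum_k h[k] * x[2n - k]."""
    big_n = (len(h) + 1) // 2                       # N, for a filter of 2N - 1 taps
    coeffs = list(h[0::2]) + list(h[1::2]) + [0]    # even phase, odd phase, zero

    def sample(i):
        return x[i] if 0 <= i < len(x) else 0       # zero initial conditions

    p_stream = []
    for n in range(len(x) // 2):
        # Data ordered to line up with coeffs: even samples, odd samples, padding.
        data = [sample(2 * n - 2 * j) for j in range(big_n)]
        data += [sample(2 * n - 2 * j - 1) for j in range(big_n - 1)]
        data += [0]
        acc = 0                                     # reset once per block
        for a, b in zip(data, coeffs):
            acc += a * b                            # P = P + A * B
            p_stream.append(acc)
    block = 2 * big_n
    return p_stream[block - 1::block]               # keep every (2N)-th value of P


def direct_decimate_by_two(x, h):
    """Reference: filter at the input rate and keep the even-indexed outputs."""
    return [sum(h[k] * x[2 * n - k] for k in range(len(h)) if 0 <= 2 * n - k < len(x))
            for n in range(len(x) // 2)]


x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [1, 2, 3]                                       # N = 2, made-up taps
print(mac_decimate_by_two(x, h))                    # [1, 10, 22, 34]
print(mac_decimate_by_two(x, h) == direct_decimate_by_two(x, h))  # True
```

With these values the second output is x[2]h[0] + x[0]h[2] + x[1]h[1] = 10, the same combination of terms that appears in Figure 25.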
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c. Results 

In order to test the performance of the simulation, a sinusoidal waveform 
with frequency $F = 0.2\,F_s$ (Hz) and sampling frequency $F_s = 1$ (Hz) is applied as an 
input. 

The FIR filter has been designed as an Equiripple filter and decomposed 
into two polyphase components with the following characteristics: 

Passband: 0-0.3 (in terms of Digital Frequency $f$) 

Stopband: 0.4-0.5 (in terms of Digital Frequency $f$) 

Order: 65 

The frequency spectra of the original and the downsampled-by-two 
signals are shown in Figure 26. We can verify that the frequency spectrum of the 
downsampled-by-two signal is stretched (in terms of the digital frequency) by the integer 
factor of two compared to the frequency spectrum of the original signal. Since the 
bandwidth of the signal is less than $1/(2D)$ (where $D = 2$), there is no aliasing 
effect. Therefore, the frequency of the original signal is $f = 0.2$ while the frequency 
of the downsampled-by-two signal is $f = 2 \times 0.2 = 0.4$ (where $f$ is the 
dimensionless digital frequency). 
Figure 26. Frequency Spectrum of the Original and Downsampled by Two Signal without Aliasing Effect. 

In order to demonstrate the aliasing effect in the frequency spectrum of a 
downsampled signal, a sinusoidal waveform with frequency $F = 0.3\,F_s$ (Hz) is applied 
as an input. Since the new bandwidth (0.3) exceeds $1/(2D)$ (where $D = 2$), the 
new frequency spectrum of the downsampled-by-two signal in the interval 
$[-1/2, 1/2]$ will contain aliased frequencies derived from the periodic repetition of 
one period of the frequency spectrum. Figure 27 illustrates this example. 
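
The two test cases above can be checked with a small calculation of where a real sinusoid lands after every D-th sample is kept. The sketch below is an idealization: it ignores the FIR filter entirely and only models the stretching and folding of the digital frequency. The function names are hypothetical.

```python
def decimated_frequency(f, d):
    """Digital frequency of a real sinusoid after keeping every d-th sample.

    f is in cycles/sample with 0 <= f <= 0.5; no anti-aliasing filter is modeled.
    """
    g = (f * d) % 1.0          # stretch by d, then wrap into one period
    return min(g, 1.0 - g)     # fold into [0, 0.5] for a real signal


def is_alias_free(f, d):
    """True when the tone lies below 1/(2d), so that stretching causes no folding."""
    return f < 1.0 / (2 * d)


for f in (0.2, 0.3):
    print(f, is_alias_free(f, 2), round(decimated_frequency(f, 2), 3))
```

Under these assumptions the tone at 0.2 moves to 0.4 without folding, while the tone at 0.3 would stretch to 0.6 and therefore folds back to 0.4.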
Figure 27. Frequency Spectrum of the Original and Downsampled by Two Signal with Aliasing Effect. 

C. DECIMATION BY AN INTEGER FACTOR ‘D’ WITH FIR MAC AND 
POLYPHASE DECOMPOSITION 

The structure of the decimation-by-two operation can be easily extended to a 
more general decimation-by-$D$ operation for any $D > 2$. The decimated signal obtained 
from an input signal $x[n]$, which is filtered by an FIR filter $h[n]$ (decomposed into 
its polyphase components) and then downsampled by an integer factor $D$, is given by 

$$ y[n] = \sum_{k=0}^{DN-1} h[k]\, x[nD-k], \tag{3.6} $$

with $D$ integer and $x[n] = x(nT_s)$, $y[n] = y(nDT_s)$ the input and output sequences 
sampled at rates $F_s = 1/T_s$ and $F_s/D = 1/(DT_s)$ respectively. 

The $D$ polyphase components of the FIR filter are defined by 

$$ h_\ell[k] = h[kD + \ell], \tag{3.7} $$

with $\ell = 0, \ldots, D-1$ and $k = 0, \ldots, N-1$. The decimated output is the 
superposition of the $D$ phases and it is given by the following expression: 

$$ y[n] = \sum_{k=0}^{N-1} h_0[k]\, x[(n-k)D] + \sum_{k=0}^{N-1} h_1[k]\, x[(n-k)D-1] + \cdots + \sum_{k=0}^{N-1} h_{D-1}[k]\, x[(n-k)D-D+1]. \tag{3.8} $$

Equation (3.8) can be further decomposed as: 

$$ y[n] = \cdots + h_\ell[0]\, x[nD-\ell] + \sum_{k=1}^{N-1} h_\ell[k]\, x[(n-k)D-\ell] + \cdots \tag{3.9} $$

During the computational time interval $(Dn-D)T_s < t < (Dn)T_s$ the data vector 
needs to be updated with samples $x[Dn-(D-1)]$ up to $x[Dn]$, while the data in the $D$ 
summations are available before time $(Dn-D)T_s$. 
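
As a numerical check of equations (3.6) to (3.8), the following sketch compares filtering followed by downsampling against the sum over the D polyphase branches. The function names, the zero-padding of the filter to a multiple of D, and the test data are assumptions made for the example.

```python
def polyphase_components(h, d):
    """Return the d components h_l[k] = h[k*d + l], zero-padded to equal length."""
    n = -(-len(h) // d)                          # ceil(len(h) / d) taps per phase
    padded = list(h) + [0] * (n * d - len(h))
    return [padded[l::d] for l in range(d)]


def decimate_polyphase(x, h, d):
    """y[n] = sum_l sum_k h_l[k] * x[(n - k)*d - l], with x[i] = 0 outside its range."""
    phases = polyphase_components(h, d)

    def sample(i):
        return x[i] if 0 <= i < len(x) else 0

    n_out = (len(x) + d - 1) // d
    return [
        sum(hl[k] * sample((n - k) * d - l)
            for l, hl in enumerate(phases)
            for k in range(len(hl)))
        for n in range(n_out)
    ]


def decimate_direct(x, h, d):
    """Filter at the input rate, then keep every d-th output sample."""
    def sample(i):
        return x[i] if 0 <= i < len(x) else 0

    full = [sum(h[k] * sample(m - k) for k in range(len(h))) for m in range(len(x))]
    return full[::d]


x = list(range(1, 13))
h = [1, 2, 3, 4, 5, 6, 7]                        # made-up taps
print(decimate_polyphase(x, h, 3) == decimate_direct(x, h, 3))  # True
```

The direct form computes every filtered sample and discards most of them, whereas the polyphase form only evaluates the products that contribute to a retained output, which is the saving the MAC design relies on.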

The design needed to perform the decimation-by-$D$ operation is similar to the 
decimation-by-two case. The memory vector for the input data points in the Dual Port 
Ram has length $DN-1$ and updates its value by 

$$ \begin{aligned}
x\big[(n)_{DN-1}\big] &\leftarrow x[nD], \\
x\big[(n-N)_{DN-1}\big] &\leftarrow x[nD-1], \\
x\big[(n-2N)_{DN-1}\big] &\leftarrow x[nD-2], \\
&\;\;\vdots \\
x\big[(n-(D-1)N)_{DN-1}\big] &\leftarrow x[nD-D+1].
\end{aligned} $$


The FIR filter coefficients vector, which is stored in the second memory of the 
Dual Port Ram, is the concatenation of its polyphase components derived from expression 
(3.7), with total length $DN-1$. 

In the implementation, $(n)_{DN-1} = 0, \ldots, DN-2$ is a periodic counter with update 
rate $F_{ac} = NF_s$, which is the clock rate of the System Generator. Therefore the 
input data is upsampled by the integer factor $N$. 

The time representations of the ‘data address’ and ‘coefficients address’ sequences are 
shown in Figure 28. In particular, at time $(Dn-D)T_s$ the accumulator is initialized as 
$a((Dn-D)T_s) = 0$ (where ‘a’ denotes the accumulation function). At every subsequent 
clock cycle $T_{ac} = 1/F_{ac}$ the accumulation is updated by 

$$ a\big((Dn-D)T_s + \lambda T_{ac}\big) = a\big((Dn-D)T_s + (\lambda-1)T_{ac}\big) + \big(\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big), $$

where $\lambda = 1, \ldots, DN$. The input data is updated at every $N$th multiple of 
$T_{ac}$, out of a total of $DN$ multiples. In particular, 
$(Dn-D+1)T_s = (Dn-D)T_s + NT_{ac}$. The output $y[n]$ at time $DnT_s$ is shown in the 
timing diagram of Figure 28. 



Figure 28. Timing Diagram for Decimation by D. 
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Apart from the new vectors that are stored in the Dual Port Ram, this design can 
be obtained by simple extension of the decimation-by-two case to the more general 
decimation-by-D. 

Referring to Figure 7, the bitstream output P of the DSP48 can be considered 
as a set of blocks of length $DN$ in which the desired output samples of the 
decimation-by-$D$ operation appear at every $(DN)$th element of each block, as shown in 
the timing diagram in Figure 28. Therefore, by downsampling the data P by the factor 
$DN$ ($D$ times the factor which was used when the input data was upsampled) the desired 
decimation-by-$D$ operation is performed. 

In order to test the performance of the simulation for the decimation factor $D = 4$, 
a sinusoidal waveform with frequency $F = 0.1\,F_s$ (Hz) and sampling frequency 
$F_s = 1$ (Hz) is applied as an input. 

The FIR filter has been designed as an Equiripple filter and decomposed into four 
polyphase components with the following characteristics: 

Passband: 0-0.2 (in terms of Digital Frequency $f$) 

Stopband: 0.25-0.5 (in terms of Digital Frequency $f$) 

Order: 29 

The frequency spectra of the original and the downsampled by $D = 4$ signals are 
shown in Figure 29. We can verify that the frequency spectrum of the downsampled 
signal is stretched (in terms of the digital frequency) by the integer factor of four 
compared to the frequency spectrum of the original signal. Since the initial bandwidth 
of the signal is less than $1/(2D)$, there is no aliasing effect. Therefore, the 
frequency of the original signal is $f = 0.1$, while the frequency of the 
decimated-by-four signal is $f = 4 \times 0.1 = 0.4$, where $f$ is the dimensionless 
digital frequency. 
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Figure 29. Frequency Spectrum of the Original and Downsampled by D = 4 Signal. 
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IV. INTERPOLATION BY AN INTEGER FACTOR 


A. THEORETICAL PERSPECTIVE 

1. Analysis of Upsampling (Interpolation) 

In Software Defined Radios (SDR), the modulation process is performed in the 
digital domain. The data rate of the transmitted information needs to be increased in order 
to match the rate of the modulation (carrier frequency). An upsample operation 
(interpolation) increases the number of samples per second of a given signal by an integer 
factor $D$. An example of interpolation by an integer factor of $D = 3$ is shown in Figure 30. 


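[Figure 30: upsampling by an integer factor $D = 3$, from input $x[n]$ to output $y[m]$ at rate $F_{s2} = DF_{s1}$, with 

$$ y[m] = \begin{cases} x[m/D], & m/D \text{ is integer} \\ 0, & \text{otherwise.} \end{cases} $$

For each sample, two more samples are added.] 

Figure 30. Upsampling Operation. 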

When a signal sampled at a rate $F_{s1}$ with frequency spectrum $X(f_1)$ (in terms of 
digital frequency) is resampled at a higher rate $F_{s2} = DF_{s1}$, where $D$ is an 
integer, the resulting frequency spectrum of the resampled signal is given by the 
following expression: 

$$ Y(f_2) = X(f_1)\big|_{f_1 = Df_2}. \tag{4.1} $$

Equation (4.1) shows that the new frequency spectrum is ‘squeezed’ in terms of the 
digital frequency (horizontal axis) [8]. Consequently, since the frequency spectrum of 
the resampled signal is a periodic repetition of one period over the interval 
$[-1/2, 1/2]$, additional image frequency components (‘ghost’ frequencies) will appear 
in the spectrum of the upsampled signal. These frequencies are artifacts created by 
the upsampling operation. The frequency spectra of the original signal and of the signal 
after upsampling by $D$ are shown in Figure 31. 
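
The zero-insertion rule of Figure 30 and the location of the image components implied by equation (4.1) can be illustrated with a short sketch. The function names are hypothetical, and the image calculation assumes a single real sinusoid with no filtering applied.

```python
def upsample_zero_insert(x, d):
    """y[m] = x[m/d] when m is a multiple of d, and 0 otherwise."""
    y = [0] * (len(x) * d)
    y[::d] = x
    return y


def image_frequencies(f, d):
    """Frequencies in [0, 0.5] at which a real tone at f appears after upsampling by d."""
    found = set()
    for k in range(d):
        for g in ((k + f) / d, (k - f) / d):
            g %= 1.0
            found.add(round(min(g, 1.0 - g), 9))   # fold to [0, 0.5] and de-duplicate
    return sorted(found)


print(upsample_zero_insert([1, 2, 3], 3))   # [1, 0, 0, 2, 0, 0, 3, 0, 0]
print(image_frequencies(0.4, 2))            # [0.2, 0.3]
```

For a tone at 0.4 upsampled by two, the component at 0.2 is the squeezed original and the one at 0.3 is an image of the kind the low pass filter is meant to remove.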
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In order to eliminate the ‘ghost’ frequencies a Low Pass Filter (LPF) is needed 
after the upsampling operation. The frequency response of the LPF along with its 
specifications is depicted in Figure 32. 



Figure 32. Upsampling and Filtering with LPF. 

2. Efficient Implementation of Interpolation Operation using Noble 
Identities and Filter’s Polyphase Decomposition 

An efficient way of implementing upsampling and filtering operations is by using 
the Noble identities with the filter’s polyphase decomposition. Since the filter in Figure 
32 operates at the higher sampling rate $F_{s2}$, it would be desirable to place the 
filter before the upsampling operation, thus minimizing the cost. It can be shown that 
the upsampling operation shown in Figure 32, with the LPF 
$H(z) = \sum_{n=0}^{N} h(n) z^{-n}$, can be implemented as shown in Figure 33, where the 
filters $H_k(z)$, for $k = 0, \ldots, D-1$, are the polyphase components 

$$ H_k(z) = \sum_{n=0}^{N/D} h(nD + k)\, z^{-n} \quad [8]. $$

The upsampling network on the right of Figure 33, after the filters, is an 
‘interlacer’ that interlaces the outputs of all $D$ filters, thus increasing the sampling 
rate. This implementation is particularly attractive, since it has the same complexity as 
the original but runs at the lowest sampling rate [8]. 
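
The equivalence between filtering after upsampling and running the polyphase filters at the low rate followed by an interlacer can be checked numerically. The sketch below uses hypothetical function names and made-up data; it assumes zero initial conditions and is not derived from the Simulink model.

```python
def interpolate_direct(x, h, d):
    """Insert d - 1 zeros after each sample, then filter at the high rate."""
    up = [0] * (len(x) * d)
    up[::d] = x
    return [sum(h[k] * up[m - k] for k in range(len(h)) if 0 <= m - k < len(up))
            for m in range(len(up))]


def interpolate_polyphase(x, h, d):
    """Run d sub-filters h_l[k] = h[k*d + l] at the low rate and interlace the outputs."""
    phases = [list(h[l::d]) for l in range(d)]
    y = []
    for n in range(len(x)):
        for hl in phases:                    # phase l yields output sample y[n*d + l]
            y.append(sum(hl[k] * x[n - k] for k in range(len(hl)) if n - k >= 0))
    return y


x = [1, 2, 3, 4]
h = [1, 2, 3, 4, 5]                          # made-up taps
print(interpolate_polyphase(x, h, 2) == interpolate_direct(x, h, 2))  # True
```

In the direct form most products involve an inserted zero; the polyphase form skips them, so each output sample costs only the taps of one sub-filter.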





[Block diagram with labels: x, LPF, Interlacer, y[m].] 

Figure 33. Efficient Implementation of Interpolation. 
B. INTERPOLATION BY TWO WITH FIR MAC AND POLYPHASE 
DECOMPOSITION 

From the polyphase decomposition the upsampling by two is determined as 

$$ \begin{aligned}
y[2n] &= \sum_{k=0}^{N-1} h_0[k]\, x[n-k], \\
y[2n+1] &= \sum_{k=0}^{N-1} h_1[k]\, x[n-k].
\end{aligned} \tag{4.2} $$

Here $h_0[k] = h[2k]$ and $h_1[k] = h[2k+1]$ are the polyphase components (even and odd 
samples) of the filter $h[n]$, while $x[n] = x(nT_s)$ with $T_s$ the sampling interval. 
Consequently, the output rate is twice the input rate. 

Equation (4.2) highlights the fact that the signal $x[n]$ is interpolated by 
interlacing two signals, $y[2n]$ and $y[2n+1]$, which are computed independently. 

1. Software Implementation 

The Simulink/Xilinx implementation needed to perform the interpolation-by-two 
has the same structure as the model presented in Figure 4, with parameters properly 
chosen to match the new case. Specifically, the initial values of the vectors of the Dual 
Port Ram along with the controller (logic circuit responsible for arranging data points and 
FIR filter’s coefficients) are changed in order to implement equation (4.2). Furthermore, 
the input data is upsampled to the System Generator’s clock rate and the MAC output is 
downsampled by half the upsampling factor, implementing the interpolation-by-two 
operation. Figure 34 illustrates this design. 
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Figure 34. Upsampling by Two. 

In order to test the performance of the simulation, a sinusoidal signal is provided 
as an input. The input signal is sampled at rate $F_s$, while the System Generator (Sysgen) 
works at a higher sampling rate equal to $2(N+1)F_s$, with $2N-2$ being the degree of the 
transfer function of the FIR filter, which is decomposed into its polyphase components. 
The polyphase filter is generated during the initialization of the simulation. Since the 
system rate provided by Sysgen is higher, the input data is upsampled by the integer 
factor $2(N+1)$ with the corresponding Xilinx block. 

The objective is to achieve a proper alignment of the data and filter’s coefficients 
so that they can be applied to a MAC resulting in the interpolation-by-two operation. 
Towards this goal, we need two memory vectors x and h, containing the data and the 
filter coefficients respectively, provided by the Dual Port Ram, and a MAC provided by 
the DSP48. 


a. Control Logic for Data and Filter Coefficients 

The vector h of the filter coefficients is defined as 

$$ \mathbf{h} = \big[\, h[0], h[2], \ldots, h[2N-2],\; 0,\; 0,\; h[1], h[3], \ldots, h[2N-3],\; 0 \,\big] $$

and it is the concatenation of the two polyphase components (one for the even and one 
for the odd samples) of a length-$(2N-1)$ FIR filter (which is generated in MATLAB) with 
three properly placed additional zeros. The length of the vector h is $2N+2$ and its 
value remains unchanged during the upsampling-by-two operation. Therefore the ports 
‘dinb’ (data input b) and ‘web’ (write enable b) are set to false. The zero coefficients 
are required by the way the controller and the DSP48 block (which works as a MAC) 
operate, as explained in the MAC procedure. 
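
The placement of the three zeros can be made concrete with a short sketch. The function name `interpolation_coeff_vector` and the tap values are assumptions for the example; only the ordering is taken from the definition above.

```python
def interpolation_coeff_vector(h):
    """Even phase, two zeros, odd phase, one zero; h is assumed to have 2N - 1 taps."""
    if len(h) % 2 == 0:
        raise ValueError("expected an odd-length filter (2N - 1 taps)")
    even = list(h[0::2])   # h[0], h[2], ..., h[2N-2]  (N taps)
    odd = list(h[1::2])    # h[1], h[3], ..., h[2N-3]  (N - 1 taps)
    return even + [0, 0] + odd + [0]


h = [1, 2, 3, 4, 5]        # N = 3, made-up taps
vec = interpolation_coeff_vector(h)
print(vec)                 # [1, 3, 5, 0, 0, 2, 4, 0]
print(vec[:4], vec[4:])    # two halves of length N + 1
```

Splitting the vector in the middle gives two halves of length N + 1, one per output phase, each ending in a zero coefficient.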

The input data vector stored in the first part of the memory of the Dual 
Port Ram is a vector x of length $N$ and it is updated as 
$x\big[(n)_N\big] \leftarrow x[n]$. The data address counter is defined as 
$(n)_N = 0, 1, \ldots, N-1$ and it is repeated twice during the sampling interval $T_s$. 
The memory vector x is initialized to the initial conditions (zero, for example) and is 
updated according to the corresponding ‘address’ and ‘write enable’ ports driven by the 
controller. Figure 35 illustrates the controller of the simulation. 
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Figure 35. Controller. 

The time representations of the ‘data address’ and ‘coefficients address’ 
sequences of Figure 35 are shown in Figure 36. In particular, at time $(2n-2)T_s$ the 
accumulator is initialized as $a((2n-2)T_s) = 0$ (where $a$ denotes the content of the 
accumulation). At every subsequent clock cycle $T_{ac} = 1/F_{ac}$, the accumulation is 
updated by 

$$ a\big((2n-2)T_s + \lambda T_{ac}\big) = a\big((2n-2)T_s + (\lambda-1)T_{ac}\big) + \big(\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big) $$

and 

$$ a\big((2n-1)T_s + \lambda T_{ac}\big) = a\big((2n-1)T_s + (\lambda-1)T_{ac}\big) + \big(\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big), $$

with $\lambda$ counting the clock cycles within each accumulation interval. The outputs 
$y[2n-1]$, $y[2n]$ at times $(2n-1)T_s$ and $2nT_s$ respectively are shown in the timing 
diagram of Figure 36. 


[Timing diagram: input data x[n-1], x[n]; data address; coefficient address; phase; output data y[2n-2], y[2n-1], y[2n] versus time.] 

Figure 36. Time Representation of Simulation. 

In order to demonstrate the functionality of the implementation, Figure 36 
illustrates the timing of the various signals involved. 

The output of the Dual Port Ram is a pair of bitstreams, one from port A 
(data points) and one from port B (filter’s coefficients). The bitstream of port B is a 
repetition of the vector h. The output of port A is a recurrent window of length 
$2N+2$, which is subdivided into two windows of length $N+1$. At every interval 
$T_{ac}$ both subwindows are updated with the same data, introducing a shift by 
one position from left to right, while the first subwindow starts updating from the second 
sample. Figure 37 illustrates the output of the Dual Port Ram. 
[Figure 37: Dual Port RAM with initial memory vector [x, h] = [0, ..., 0, h[0], h[2], ..., h[2N-2], 0, 0, h[1], h[3], ..., h[2N-3], 0]. Port A outputs successive data windows of length 2N+2, each split into two subwindows of length N+1; port B repeats h[0], h[2], ..., h[2N-2], 0, 0, h[1], h[3], ..., h[2N-3], 0 with period 2N+2.] 

Figure 37. Outcome of Dual Port Ram. 

b. Sequential Multiplication and Accumulation (MAC) of Data and 
Filter Coefficients using DSP48 

The output bitstream from the Dual Port Ram, shown in Figure 37, 
is processed by the DSP48 Xilinx block, which implements the MAC. Its operation 
mode is defined by $P = P + A \cdot B$ (referring to Figure 7): the product of each pair 
of outputs from ports A and B of the Dual Port Ram is accumulated with the previous 
products. A reset signal for the output P (selected from the DSP48 options) is applied 
once every $N+1$ clock cycles and is derived from the properly delayed ‘write enable 1’ 
signal of the controller of the Dual Port Ram (referring to Figure 35). The delay is 
adjusted so that the reset of P occurs every $N+1$ samples, at the positions where the 
coefficients of vector h are zero. The reset therefore does not affect the accumulation 
of the interpolation-by-two operation and there is no loss of information. The third 
zero coefficient of vector h, which is placed at the first odd sample, does not affect 
the accumulation either, since the first subwindow of the data vector is updated from 
the second sample. 

Consequently, considering the length ($2N+2$) of the block pairs in Figure 
37, whenever a product $x[k] \cdot h[0]$ is accumulated with the previous $2N+1$ sums of 
products of each block pair, two data points of the interpolation-by-two operation 
$y[n]$ are provided, one at every $(N+1)T_{ac}$ interval, as also shown in the timing 
diagram in Figure 36. Figure 38 shows the first three points $y[0]$, $y[1]$, $y[2]$, 
provided at times $\mu(N+1)T_{ac}$, where $\mu = 1, 2, 3$. 



Figure 38. Outcome of DSP48. 
Referring to Figure 7, the bitstream output P of the DSP48 can be 
considered as a set of blocks of length $2N+2$ in which the desired output samples of the 
interpolation-by-two operation appear at every $(N+1)$th and $(2N+2)$th element 
of each block, as shown in the timing diagram in Figure 36. Therefore, by downsampling 
the data P by the factor $N+1$ (half the factor that was used when the input data was 
upsampled) the desired interpolation-by-two operation is performed. 
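
A behavioral sketch of this procedure is given below: each input sample drives two accumulation blocks of N + 1 products, the accumulator is reset at the start of each block, and the last value of every block is retained. The function names, the way the data windows are aligned with the zero coefficients, and the test values are assumptions made for illustration; the controller's address sequences are not reproduced.

```python
def mac_interpolate_by_two(x, h):
    """Behavioral model: two blocks of N + 1 products per input sample, one MAC."""
    big_n = (len(h) + 1) // 2                                 # N, for 2N - 1 taps
    coeffs = list(h[0::2]) + [0, 0] + list(h[1::2]) + [0]     # length 2N + 2
    half = big_n + 1

    def sample(i):
        return x[i] if 0 <= i < len(x) else 0                 # zero initial conditions

    p_stream = []
    for n in range(len(x)):
        # Data windows lined up with the two halves of the coefficient vector.
        even_data = [sample(n - k) for k in range(big_n)] + [0]
        odd_data = [0] + [sample(n - k) for k in range(big_n - 1)] + [0]
        for data, c in ((even_data, coeffs[:half]), (odd_data, coeffs[half:])):
            acc = 0                                           # reset every N + 1 cycles
            for a, b in zip(data, c):
                acc += a * b                                  # P = P + A * B
                p_stream.append(acc)
    return p_stream[half - 1::half]                           # keep every (N+1)-th value


def interpolate_by_two_reference(x, h):
    """Equation (4.2): y[2n] uses h[2k], y[2n+1] uses h[2k+1]."""
    y = []
    for n in range(len(x)):
        for phase in (h[0::2], h[1::2]):
            y.append(sum(phase[k] * x[n - k] for k in range(len(phase)) if n - k >= 0))
    return y


x = [1, 2, 3, 4]
h = [1, 2, 3, 4, 5]                                           # N = 3, made-up taps
print(mac_interpolate_by_two(x, h))                           # [1, 2, 5, 8, 14, 14, 23, 20]
print(mac_interpolate_by_two(x, h) == interpolate_by_two_reference(x, h))  # True
```

Because the products that coincide with a zero coefficient contribute nothing, resetting the accumulator at those positions leaves the retained values unchanged, which is the point made in the text.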

c. Results 

In order to test the performance of the simulation, a sinusoidal waveform 
with frequency $F = 0.4\,F_s$ (Hz) and sampling frequency $F_s = 1$ (Hz) is applied as an 
input. 

The FIR filter has been designed as an Equiripple filter and decomposed 
into two polyphase components with the following characteristics: 

Passband: 0-0.4 (in terms of Digital Frequency $f$) 

Stopband: 0.45-0.5 (in terms of Digital Frequency $f$) 

Order: 31 

The frequency spectra of the original and the upsampled-by-two signals 
are shown in Figure 39. We can verify that the frequency spectrum of the upsampled-by- 
two signal is squeezed (in terms of the digital frequency) by the integer factor of two 
compared to the frequency spectrum of the original signal. Therefore, the frequency of 
the original signal is $f = 0.4$ while the frequency of the upsampled-by-two signal is 
$f = 0.4/2 = 0.2$ (where $f$ is the dimensionless digital frequency). 
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Figure 39. Frequency Spectrum of the Original and Upsampled by Two Signal. 

C. INTERPOLATION BY AN INTEGER FACTOR ‘D’ WITH FIR MAC AND 
POLYPHASE DECOMPOSITION 

The structure introduced for the interpolation-by-two operation can be easily 
extended to a more general interpolation-by-$D$ operation for any $D > 2$. The 
interpolated signal $y[n]$ obtained from an input signal $x[n]$, which is upsampled by an 
integer factor $D$ and then filtered by an FIR filter $h[n]$ (decomposed into its 
polyphase components), is given by the following expression: 

$$ \begin{aligned}
y[Dn] &= \sum_{k=0}^{N-1} h_0[k]\, x[n-k], \\
y[Dn+1] &= \sum_{k=0}^{N-1} h_1[k]\, x[n-k], \\
&\;\;\vdots \\
y[Dn+D-1] &= \sum_{k=0}^{N-1} h_{D-1}[k]\, x[n-k],
\end{aligned} \tag{4.3} $$

with $D$ integer and $x[n] = x(nT_s)$, $y[n] = y(nT_s/D)$ the input and output sequences 
sampled at rates $F_s = 1/T_s$ and $DF_s = D/T_s$ respectively. 

The $D$ polyphase components of the FIR filter are defined as: 

$$ h_\ell[k] = h[kD + \ell] \tag{4.4} $$

with $\ell = 0, \ldots, D-1$ and $k = 0, \ldots, N-1$. 

The simulation needed to perform the interpolation-by-$D$ operation is similar to 
the interpolation-by-two case. The memory vector for the input data points in the Dual 
Port Ram has length $DN$ and it is updated as $x\big[(n)_{DN}\big] \leftarrow x[n]$. The 
data address counter is defined as $(n)_{DN} = 0, \ldots, DN-D-1$ and it is repeated $D$ 
times during the input interval $T_s$. 

The FIR filter coefficients vector, which is stored in the second memory of the 
Dual Port Ram, is made of the polyphase components derived from expression (4.4) with 
total length D(N + 1). 

The input signal is sampled at a rate $F_s$, while System Generator (Sysgen) works 
at a higher sampling rate equal to $D(N+1)F_s$. Therefore the input data is upsampled by 
the integer factor $D(N+1)$ with the corresponding Xilinx block. 

The time representations of the ‘data address’ and ‘coefficients address’ sequences are 
shown in Figure 40. In particular, at time $(Dn-D)T_s$ the accumulator is initialized as 
$a((Dn-D)T_s) = 0$ (where ‘a’ denotes the accumulation function). At every subsequent 
clock cycle $T_{ac} = 1/F_{ac}$ the accumulation is updated by 

$$ \begin{aligned}
a\big((Dn-D)T_s + \lambda T_{ac}\big) &= a\big((Dn-D)T_s + (\lambda-1)T_{ac}\big) + \big(\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big), \\
&\;\;\vdots \\
a\big((Dn-1)T_s + \lambda T_{ac}\big) &= a\big((Dn-1)T_s + (\lambda-1)T_{ac}\big) + \big(\mathrm{data\_addr}(\lambda) \cdot \mathrm{coeff\_addr}(\lambda)\big),
\end{aligned} $$

where $\lambda = 1, \ldots, N$. The outputs $y[Dn-D], \ldots, y[Dn]$ are shown in the 
timing diagram of Figure 40. 


[Timing diagram: input data x[n-1], x[n]; data address; coefficient address; phase; time axis from Dn-D to Dn.] 

Figure 40. Timing Diagram for Interpolation by D. 

Apart from the new vectors that are stored in the Dual Port Ram, this design can 
be obtained by simple extension of the interpolation-by-two case to the more general 
interpolation-by-D. 

Referring to Figure 7, the bitstream output P of the DSP48 can be considered 
as a set of blocks of length $N+1$ in which the desired output samples of the 
interpolation-by-$D$ operation appear at every $(N+1)$th element of each block, as shown 
in the timing diagram in Figure 40. Therefore, by downsampling the data P by the factor 
$N+1$ ($D$ times smaller than the factor which was used when the input data was 
upsampled) the desired interpolation-by-$D$ operation is performed. 
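
The rate bookkeeping used in the decimation and interpolation designs can be summarized with a few lines of arithmetic. The helper below is hypothetical; it only restates the upsampling and downsampling factors quoted in the text (N and DN for decimation, D(N+1) and N+1 for interpolation) and checks the resulting output rate.

```python
def clock_plan(mode, d, n, fs=1.0):
    """Rates implied by the text for one time-multiplexed MAC.

    mode: "decimate" or "interpolate"; d: resampling factor D;
    n: taps per polyphase component N; fs: input sampling rate F_s.
    """
    if mode == "decimate":
        up, down = n, d * n              # upsample by N, keep every (DN)-th value of P
    elif mode == "interpolate":
        up, down = d * (n + 1), n + 1    # upsample by D(N+1), keep every (N+1)-th value
    else:
        raise ValueError("mode must be 'decimate' or 'interpolate'")
    clock = up * fs                      # System Generator clock rate F_ac
    return {"clock": clock, "upsample": up, "downsample": down,
            "output_rate": clock / down}


print(clock_plan("decimate", 2, 33))
print(clock_plan("interpolate", 4, 30))
```

In both modes the net rate change is the ratio of the two factors: 1/D for decimation and D for interpolation. The values of N used here are arbitrary.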

In order to test the performance of the simulation for interpolation factor $D = 4$, a 
sinusoidal waveform with frequency $F = 0.4\,F_s$ (Hz) and sampling frequency $F_s = 1$ 
(Hz) is applied as an input. 

The FIR filter has been designed as an Equiripple filter and decomposed into four 
polyphase components with the following characteristics: 

Passband: 0-0.2 (in terms of Digital Frequency $f$) 

Stopband: 0.25-0.5 (in terms of Digital Frequency $f$) 

Order: 120 

The frequency spectra of the original and the upsampled by $D = 4$ signals are 
shown in Figure 41. We can verify that the frequency spectrum of the upsampled by 
$D = 4$ signal is squeezed (in terms of the digital frequency) by the integer factor of four 
compared to the frequency spectrum of the original signal. Therefore, the frequency of 
the original signal is $f = 0.4$, while the frequency of the upsampled-by-four signal is 
$f = 0.4/4 = 0.1$ (where $f$ is the dimensionless digital frequency). 
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Figure 41. Frequency Spectrum of the Original and Upsampled by D = 4 Signal. 
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V. CONCLUSIONS 


A. SUMMARY OF THE WORK 

In this research, we presented an architecture for implementing resampling 
operations in FPGAs. The particularly interesting feature of this approach is the use of a 
specific functional block (the DSP48), which is optimized for DSP applications in real 
time. Although a number of applications are possible, this approach is particularly 
attractive in the implementation of Software Defined Radios (SDR). 

Three classes of DSP operations have been implemented in software in the Simulink 
design environment: 

1) Finite Impulse Response filter 

2) Decimation by an integer factor 

3) Interpolation by an integer factor 

All subsystems of the design are fully accessible by the designer at every stage. 

The key ingredient was the use of a Multiplier and Accumulator (MAC) 
architecture carried out from the DSP48 slice, which is an efficient Xilinx block (from 
the Simulink library) for many DSP applications. The dual input fed to the DSP48 was 
provided from the Dual Port Ram Xilinx block, which is a memory device that allows the 
user to specify the width and the values for each memory part in order to perform the 
three above mentioned operations. 

The Xilinx System Generator was used to map the software design onto a 
Virtex-4 FPGA, increasing the computation data rate according to each case. 

MATLAB code was used to generate the FIR filter and its polyphase 
decomposition in the design, and also to verify the performance by producing 
plots that agree with the corresponding theoretical perspective. 
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B. SUGGESTION FOR FUTURE WORK 

The designs presented in this thesis will be part of a general Software Defined 
Radio (SDR) implementation. In particular, they will be integrated with both the modulation 
and demodulation processes, so that the whole radio will be implemented in software. 

There are several issues to be addressed. The most important is whether this 
approach can be implemented in real time using a reasonable amount of chip “real 
estate”. In order to address this problem, higher-level language code needs to be used to 
implement the algorithm on the chip. 

This is part of an ongoing research project. 
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